The main question we will tackle is : Do we observe the same pattern in the cell types (population level) as in the microscopic level ? And the related questions would be : Is the building block theory consistent with what we see in the macroscopic level ? Can we infer the microscopic architecture with only the cell types proportions of a given tumor without other prior info ?
To answer those questions, we will use a dataset from the article of Wagner et al., Cells (2019). In this article, scientists provide a single cell atlas from Breast cancer samples, but also some healthy tissues near the tumor. Using mass cytometry and clustering methods, they identified the types of around 26 millions of cells using 73 different markers-from markers of live cells to more specific ones such as immune checkpoints. This huge dataset allows a better identification of the landscape of tumors in breast cancer. We are going to proceed as before with a PCA trying to find the PCs that catch the most the variance of the dataset and maybe find structure in the TME of BC.
Prior to everything, we are going to select the cell types identified in this paper to keep the most relevant ones. We don’t need to be very specific but most of the selected cell types should correspond to the ones labeled in Keren et al., Cell (2018) paper.
Finally, we will confront our results to the building blocks analysis.
rm(list=ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 524678 28.1 1166202 62.3 665754 35.6
## Vcells 973298 7.5 8388608 64.0 1819512 13.9
.libPaths("/scratch/anissa.el/R_old/x86_64-redhat-linux-gnu-library/4.0/R")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ade4)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(readxl)
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(igraph)
##
## Attaching package: 'igraph'
##
## The following object is masked from 'package:plotly':
##
## groups
##
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
##
## The following objects are masked from 'package:purrr':
##
## compose, simplify
##
## The following object is masked from 'package:tidyr':
##
## crossing
##
## The following object is masked from 'package:tibble':
##
## as_data_frame
##
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
##
## The following object is masked from 'package:base':
##
## union
library(viridis)
## Loading required package: viridisLite
library(ggpubr)
library(ggplot2)
library(ggrepel)
#library(devtools)
library(processx)
First, we remove the older objects and set the new working directory
dirName <- dirname(rstudioapi::getSourceEditorContext()$path )
setwd(dirName)#setwd("/scratch/anissa.el/ImmuneStates/wagner_analysis")
source ("./scripts/R/Wagner_Keren_functions.r") # Some useful functions for Keren and Wagner Data processing
MetaData <- readxl::read_excel("./data/mmc2.xlsx",col_names=TRUE,skip=1)%>%dplyr::rename(Patient=`Patient ID`)
MetaData<- MetaData%>%
mutate(Status = ifelse(`ER Status by IHC`=='P' & `PR Status by IHC`=='P'& `HER2 Status by IHC`=='N',"ER+/PR+",
ifelse(`ER Status by IHC`=='P' & `PR Status by IHC`=='N'& `HER2 Status by IHC`=='N',"ER+/PR+",
ifelse(`ER Status by IHC`=='N' & `PR Status by IHC`=='P'& `HER2 Status by IHC`=='N',"ER+/PR+",
ifelse(`ER Status by IHC`=='N' & `PR Status by IHC`=='N'& `HER2 Status by IHC`=='P',"HER2+",
ifelse(`ER Status by IHC`=='N' & `PR Status by IHC`=='N'& `HER2 Status by IHC`=='N',"TN",
ifelse(`ER Status by IHC`=='N' & `PR Status by IHC`=='P'& `HER2 Status by IHC`=='P',"PR+HER2+",
ifelse(`ER Status by IHC`=='P' & `PR Status by IHC`=='P'& `HER2 Status by IHC`=='P',"HER2+","Unknown"))) )))))
#Check the tumors for which patients received a Neoadjuvant treatment ==> this may interfere with the proportions of T cells in the sample:
#NeoAdjuvant <- MetaData%>%filter(`Neoadjuvant Therapy Received` =="yes")
saveRDS(MetaData,"./data/metadataBC.rds")
This is the composition of BC samples : 54 luminal A, 71 luminal B, six luminal B-HER2+, one HER2+, and six TN tumors (144 in total).
CellsProp <- read_excel("./data/mmc5.xlsx",col_names=TRUE,skip=2)
#colnames(CellsProp)
CellsPropBC <- CellsProp%>%filter(`LiveCells [%]`>50)#CellsProp%>%filter(Tissue=="Tumor")%>%filter(`LiveCells [%]`>50)
#CellsPropBC <- left_join(CellsPropBC,MetaData,by="Patient")%>%filter(`Neoadjuvant Therapy Received` !="yes")
CellsPropBC <- CellsPropBC%>%
mutate(LiveCells_tot= `EndothelialCells [%]`+`EpithelialCells [%]`+`Fibroblasts [%]`+`ImmuneCells [%]`+`Other [%]`)
CellsPropBC <- CellsPropBC%>%
mutate(EndothelialCells=((`EndothelialCells [%]`/LiveCells_tot))*100)%>%
mutate(EpithelialCells=((`EpithelialCells [%]`/LiveCells_tot))*100)%>%
mutate(Fibroblasts=((`Fibroblasts [%]`/LiveCells_tot))*100)%>%
mutate(ImmuneCells=((`ImmuneCells [%]`/LiveCells_tot))*100)%>%
mutate(Other=((`Other [%]`/LiveCells_tot))*100)
#Correction to ImmuneCells percentage
CellsPropBC <- CellsPropBC%>%
mutate(TCells = (`TCells [%]`/100)*ImmuneCells)%>%
mutate(NK = (`NaturalKillerCells [%]`/100)*ImmuneCells)%>%
mutate(Granulocytes = (`Granulocytes [%]`/100)*ImmuneCells)%>%
mutate(BCells= (`BCells [%]`/100)*ImmuneCells)%>%
mutate(PlasmaCells = (`PlasmaCells [%]`/100)*ImmuneCells)%>%
mutate(DC = (`plasmacytoidDendriticCells [%]`/100)*ImmuneCells)%>%
mutate(MyeloidCells = (`MyeloidCells [%]`/100)*ImmuneCells)%>%
mutate(Basophils = (`Basophils [%]`/100)*ImmuneCells)
## Let's check for the proportion of T cells across the tumor samples
#ggplot(data= CellsPropBC)+
# geom_boxplot(mapping=aes(y=TCells))+
# labs(title="Boxplot of proportions of T cells across 144 tumor samples (in %)")+
# ylab("percentage")
## Correction of T cells percentage
CellsPropBC <- CellsPropBC%>%
mutate(sumTcells = `TCells CD4+ [%]`+`TCells CD8+ [%]`)%>%
mutate(OtherTcells = ((100 - sumTcells)/100)*TCells)%>%
mutate(`CD4-T` = (`TCells CD4+ [%]`/100)*TCells)%>%
mutate(`CD8-T` = (`TCells CD8+ [%]`/100)*TCells)
CellsPropBC.raw <- CellsPropBC %>%
mutate(Monocytes = (M06 + M15)*MyeloidCells)%>%
mutate(Macrophages = (M01 + M02 + M03 + M04 + M08 + M09 + M11 + M13 + M14 + M16 + M17)*MyeloidCells)%>%
mutate(OtherMyeloid = (M05 + M07 + M10 + M12 + M18 + M19)*MyeloidCells)%>%
mutate(OtherImmune = OtherMyeloid+ OtherTcells)%>%
filter(Tissue=="Tumor")%>%
dplyr::select(`Sample ID`,EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune,Basophils,Granulocytes,Monocytes)%>%
column_to_rownames(var = "Sample ID")
## Correction to myeloid cells percentage
CellsPropBC <- CellsPropBC %>%
mutate(Monocytes = (M06 + M15)*MyeloidCells)%>%
mutate(Macrophages = (M01 + M02 + M03 + M04 + M08 + M09 + M11 + M13 + M14 + M16 + M17)*MyeloidCells)%>%
mutate(OtherMyeloid = (M05 + M07 + M10 + M12 + M18 + M19)*MyeloidCells)%>%
mutate(OtherImmune = OtherMyeloid+ OtherTcells+Basophils+Granulocytes+Monocytes)
#mutate(CD4TCells = (T02+T03+T04+T08+T09+T13+T17+T18+T20)*50)%>%
#mutate(Tregs = T01*50)%>%
#mutate(CD8Tcells = (T05 + T06 + T07 + T10 + T11 + T12 + T14 + T15 + T16 + T19)*50)
CellsPropBC2 <- CellsPropBC%>%
filter(Tissue=="Tumor")%>%
dplyr::select(`Sample ID`,EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune)%>%
column_to_rownames(var = "Sample ID")
#Select metadata from BC patients
MetaData <- MetaData%>%
mutate(Patient = rownames(CellsPropBC2))
StatusTumor <- MetaData%>%
dplyr::select(Patient,Status)
AgeResection <- MetaData%>%
dplyr::select(Patient,`Age at Surgery`)
AllCellsProps <- CellsPropBC%>%
dplyr::select(`Sample ID`,EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune)%>%
column_to_rownames(var = "Sample ID")
## Select other healthy breast tissue (healthy patient or tissue close to the tumor)
JTCellsProps <- CellsPropBC%>%
filter(Tissue=="Juxta-tumoral")%>%
dplyr::select(`Sample ID`,EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune)%>%
column_to_rownames(var="Sample ID")
MCellsProps <- CellsPropBC%>%filter(Tissue=="Mammoplasty")%>%
dplyr::select(`Sample ID`,EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune)%>%
column_to_rownames(var="Sample ID")
ggplot(data=CellsPropBC, aes(x=Other,y=OtherImmune))+
geom_point()
correlationLeukoAdipo <- cor(CellsPropBC$OtherImmune,CellsPropBC$Other,method="pearson")
After having excluded the tumors which received a neo-adjuvant therapy (which targets dividing cells), we still have the same boxplot which confirms the sayings in the article. This still does not explain why we have an outlier in our distribution. It can be imputed either to the clustering method, either to pure hazard, either a sample which has a big tertiary lymphoid structure.
Let’s see the amount of Other Cells in this dataset. Other Cells is a cell type given for a cell that hasn’t been identified as nor immune, nor as epithelial, nor as fibroblast nor endothelial :
p1 <- ggplot(data = CellsPropBC2)+
geom_boxplot(mapping=aes(y=Other))+
labs(title="Boxplot of proportions of other cells across 144 tumor samples (in %)")+
ylab("percentage")
p1
#p2 <- ggplot(data= CellsPropBC2)+
# geom_boxplot(mapping=aes(y=Macrophages))+
# labs(title="Boxplot of proportions of Macrophages across 144 tumor samples (in %)")+
# ylab("percentage")
#p2
We can find some outliers in the boxplot. Indeed, there are tumors where we can find up to 80 % of cells that nor cancer cells nor immune cells. Those can be adipocytes in the breast tissue, but we can’t really say that we are looking at the tumor micro environment in this case. Maybe those tissues samples from patients are on the edge of the TME. The best solution is to get rid of those outliers by putting a threshold : tumors with more than 50 % of the cell composition of Other cells will be removed from our analysis.
#CellsPropBC2<- CellsPropBC2%>%filter(Other<=50)
Let’s reduce the dimension of our dataset with an unscaled PCA on the cell types proportion space.
We are also going to project the TMENs in this space to see if the plane encompasses some points. Each point will be represented a a building block : the proportions of the pure building block. For example, the TMEN3(BB3) will be exclusively composed of cancer cells, so we would have 100 % epithelial cells.
scBCPCA <- dudi.pca(df = CellsPropBC2,scale= FALSE, center=TRUE, scannf= FALSE, nf = length(colnames(CellsPropBC2))-1)
fviz_eig(scBCPCA)
#fviz_pca_biplot(scBCPCA,col.var = 'indianred2',col.ind = 'steelblue2',invisible = 'ind', repel = TRUE)
#fviz_pca_biplot(scBCPCA,axes=c(1,3),col.var = 'indianred2',col.ind = 'steelblue2',invisible = 'ind', repel = TRUE)
# Write coordinates of points in 3PCs in csv file : used for archetype analysis
write.csv(as.matrix(scBCPCA$li[,1:3]), "./data/3PCA_Wagner.csv",row.names=FALSE)
rownames(scBCPCA$co) <- set_cells_names(rownames(scBCPCA$co))
biplot12 <- fviz_pca_biplot(scBCPCA,col.var = 'darkred',col.ind = 'steelblue2',label= "var",repel=TRUE)
biplot12$labels$x <- str_replace(biplot12$labels$x,"Dim1",replacement="PC 1")
biplot12$labels$y <- str_replace(biplot12$labels$y,"Dim2",replacement="PC 2")
ggsave("./figs/biplotWagner.pdf",plot=biplot12)
## Saving 10 x 5 in image
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
biplot13 <- fviz_pca_biplot(scBCPCA,axes=c(1,3),col.var = 'indianred2',col.ind = 'steelblue2',label= "var",repel=TRUE)
biplot13$labels$x <- str_replace(biplot13$labels$x,"Dim1",replacement="PC 1")
biplot13$labels$y <- str_replace(biplot13$labels$y,"Dim3",replacement="PC 3")
#biplot12
#biplot13
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## png
## 2
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## png
## 2
The PCA explains most of the variance of the dataset (More than 80 %).
The first PC explains around 60 % of the variability (first
eigenvector). We can see with the biplot 3 major axes : Other cells that
are not cancer cells nor immune cells nor fibroblasts. We can see here
that Fibroblasts are not as abundant as we saw in the PCA we did after
using EPIC on TCGA data (bulk RNAseq). The axis with the majority of
epithelial cells is represented as the corner of less immune infiltrated
tumors ==> parallel with the building block 3 of only cancer cells
that are positive to KI67 and other keratin markers (panKeratin
markers).
Why do Other cells contribute a lot to the variance of this dataset ? First, we did an unscaled PCA so, the most abundant cell types might hide the effect of less abundant ones. Why do some sample are composed of more than 50 % of other cells ? Probably because of the way we collect samples of breast tumors. Usually, clinicians don’t take only the tumor,most of the time the sample is surrounded by healthy tissue and a mix of Fibroblasts and adipocytes. The samples are not always the most perfect ones and It is more probable that we might have some samples that have been cut in a place on the edge of the tumor, so It would be normal that we have a lot of adipocytes, way more than cancer/immune cells.
### Plot the tumors in the PCA space (3 first PCs that epxlain almost all of the variance of the dataset)
fig1 <- plot_ly(x = scBCPCA$li$Axis1,
y = scBCPCA$li$Axis2,
z = scBCPCA$li$Axis3,
#color =~CellsPropBC2$Other,
showlegend = TRUE,
type = "scatter3d", mode = "markers",
marker = list(symbol = "triangle", size = 6, color = toRGB("steelblue", .7)),
mode = "text",
name = "tumor")
#fig1 <- fig1 %>% add_trace(x=as.vector(JTissue_proj[,1]),
# y=as.vector(JTissue_proj[,2]),
# z=as.vector(JTissue_proj[,3]),
# type="scatter3d",mode="markers",
# showlegend=TRUE,
# name="Juxta-tumoral tissue",
# marker=list(color="mediumseagreen",symbol="star-diamond",size=6),
# inherit = FALSE)
#fig1 <- fig1 %>% add_trace(x=as.vector(MTissue_proj[,1]),
# y = as.vector(MTissue_proj[,2]),
# z = as.vector(MTissue_proj[,3]),
# type = "scatter3d",mode = "markers+text",
# text = "healthy tissue",
# showlegend = TRUE,
# name = "Mammary healthy tissue",
# marker = list(color="indianred2",symbol="star-diamond",size=6),
# inherit = FALSE)
fig1 <- fig1 %>% layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3") ),
title = "PCA on cell proportions space of BC tumors")
fig1
import sys
sys.path.insert(1, '../TMENS_analysis/src/utils')
from archetypes import ArchetypalAnalysis
import pandas as pd
import numpy as np
## FIXME:A print command in the archetypal analysis ==> remove it !!
# Reading the file of PCA from Wagner dataset
pcData = pd.read_csv("./data/3PCA_Wagner.csv")
pc3dWagner = np.array(pcData, dtype = "float64")
AA = ArchetypalAnalysis(n_archetypes = 4,
tolerance = 0.001,
max_iter = 200,
random_state = 0,
C = 0.0001,
initialize = 'random',
redundancy_try = 30)
AA.fit_transform(pc3dWagner)
## array([[-2.11447582e+00, 1.22262213e+01, -3.81950903e+00],
## [-2.46313754e+00, 1.25552925e+01, -4.75035478e+00],
## [-5.08984350e+01, -3.44545433e+00, -5.37733959e+00],
## [-2.71959301e+01, -5.54146656e-02, 3.25209267e+00],
## [ 4.73061628e+01, -4.61314399e+01, -2.92737706e+00],
## [ 2.98819184e+01, -3.01211542e+01, -1.51664187e+00],
## [ 2.67726696e+01, 5.02191678e+00, -8.62939498e+00],
## [-3.69406103e+01, -1.62094675e+01, -4.77333144e+00],
## [ 3.35065239e+01, -3.30954237e+01, -4.61396526e+00],
## [-5.35258310e+01, -5.07627582e+00, -5.46994217e+00],
## [ 2.32254048e+01, -1.63103760e+01, -3.53561404e+00],
## [-3.79032621e+01, -8.85958319e-01, -2.76437678e+00],
## [ 5.02186412e+01, -5.35719829e+01, -1.77349134e+00],
## [-2.51797266e+01, 5.74919192e+00, -8.01933080e+00],
## [-1.71737635e+01, -9.41651727e+00, -6.69073490e+00],
## [-1.81349664e+01, -1.74427501e+01, -6.30552731e-01],
## [-4.80088753e+01, -2.69944656e+00, -4.57762962e+00],
## [ 2.07220851e+01, 1.18036755e+01, 2.42061593e+01],
## [ 2.93271852e+01, 2.64751624e+01, -1.45895280e+01],
## [ 2.02414492e+01, -1.50353105e+01, 1.69676322e+00],
## [ 1.10734532e+01, 9.61189034e+00, 2.06346478e+01],
## [-2.94502974e+01, -6.90847029e+00, -4.27249034e+00],
## [-2.01525806e+01, 8.53302427e+00, -9.03741632e+00],
## [ 2.64191518e+01, -3.95883340e+01, -1.35201193e+00],
## [ 1.42247707e+01, 1.07371113e+01, -1.93365461e+00],
## [ 1.72957574e+01, -1.20857318e+01, 4.29692168e+00],
## [ 2.77635151e+01, -2.32343770e+01, 7.88676334e-01],
## [ 1.72506230e+01, -2.25475190e+01, 1.84177788e+00],
## [ 4.89637981e+01, -4.95988470e+01, -1.27266629e+00],
## [ 3.92511585e+01, -9.16928143e+00, -8.74769546e+00],
## [ 3.41957062e+01, -6.02166969e+00, -9.24740362e+00],
## [ 9.04003788e+00, 2.18424920e+00, -5.13262223e+00],
## [ 4.23320468e+00, 1.61640679e+01, -6.52704743e+00],
## [-1.84270620e+01, -1.01686018e+01, -5.82066057e+00],
## [ 3.02896769e+01, 2.48170704e+00, -8.48649008e+00],
## [ 3.44761283e+01, 3.94330032e-02, -5.78244741e+00],
## [-1.04351062e+00, -1.02140075e+00, 1.21150266e+01],
## [ 3.41630293e-01, 1.17730464e+01, -7.16073709e+00],
## [-1.30103683e+01, -2.42482006e+01, -4.32735970e+00],
## [ 7.54746667e+00, -3.52802072e+01, -3.24215058e+00],
## [ 2.28985149e+01, -2.29205543e+01, -4.34302982e-01],
## [ 3.19518794e+01, -4.07232303e+01, -1.31143486e+00],
## [ 3.15613170e+01, 2.05111731e+01, -7.24995967e+00],
## [ 3.73247297e+01, -4.66056118e+01, -2.46215830e+00],
## [ 2.54431201e+01, -3.03209185e+01, 3.83976843e+00],
## [-5.69722039e+01, -6.42077551e+00, -5.38898113e+00],
## [-1.62461796e+00, -1.92215574e+01, -1.73511413e+00],
## [ 3.59000049e+00, 1.62280654e+01, -7.18072183e+00],
## [ 2.47493931e+01, 1.61607572e+01, 1.53301237e+01],
## [ 3.65615760e+01, -1.63922654e+01, -4.70337965e+00],
## [ 3.98346909e+01, -3.52175808e+01, -4.49664948e+00],
## [ 2.06316690e+00, -2.39477226e+01, -1.93266725e+00],
## [ 1.75769765e-01, -1.68218955e+01, 1.66959267e+00],
## [-5.96820034e+01, -6.46095244e+00, -5.55604586e+00],
## [ 9.10185546e+00, 4.31260612e+00, 1.04355356e+00],
## [-6.93994417e+00, -2.33043208e+01, -1.26626137e+00],
## [-1.91526746e+01, -2.46079052e+00, -1.47050080e+00],
## [-4.37184280e+01, -6.68171866e+00, -3.33140163e+00],
## [ 1.92742424e+01, -2.10554591e+01, -9.78152213e-01],
## [-1.36355998e-01, 9.37911195e+00, -4.86903859e+00],
## [-5.99042944e+00, 7.31212265e+00, 9.57051687e+00],
## [ 1.63504021e+01, -2.41052095e+01, 4.71343671e+00],
## [ 4.45854707e-02, -1.15926244e+01, -6.98377996e+00],
## [ 3.29473106e+01, 2.81257847e+01, -1.20614938e+01],
## [-9.93010701e-02, -1.98619382e+01, -4.78863648e-01],
## [-2.04558103e+01, 3.81817975e+00, -3.26869912e+00],
## [ 3.85321745e+01, -9.94698860e+00, -1.32167878e+00],
## [-2.50521830e+01, 7.72110651e-01, 6.93815433e+00],
## [ 3.79294821e+01, 5.86157350e+00, -9.92236004e+00],
## [ 2.88295242e+01, -4.91497461e+00, 8.06010724e+00],
## [-3.10312201e+01, 2.89972280e+00, -3.54106919e+00],
## [-3.33755630e+01, 5.05559792e-01, -3.42629018e+00],
## [ 2.59707608e+01, -3.13706795e+01, 1.48307551e+00],
## [-1.33326956e+01, -1.34840370e+01, -1.41669537e+00],
## [ 3.25329064e+01, 2.75209448e+01, -1.06768080e+01],
## [ 5.08295622e+00, 1.69073001e+01, -9.35886680e+00],
## [ 1.26869488e+01, 1.63113104e+01, 2.15937219e+00],
## [ 2.87126964e+01, 2.67233147e+01, -1.23022517e+01],
## [ 2.83333711e+01, 2.73889380e+01, -1.47067437e+01],
## [-1.87248958e+01, 3.37135058e+00, 8.01295764e+00],
## [ 4.16061380e+00, 1.63587352e+01, -6.97266717e+00],
## [ 9.37104259e-01, 1.26515117e+01, 6.43989843e-01],
## [-1.43280902e+01, 7.64070582e+00, -2.09372523e-01],
## [-6.67832564e+00, 8.06026434e+00, 6.59212157e+00],
## [ 1.10741886e+01, 1.67133354e+01, -9.06875760e+00],
## [ 2.41474342e+01, 2.19556560e+01, -2.75680668e+00],
## [ 3.40806335e+01, 2.86783729e+01, -1.51475643e+01],
## [-6.27403715e+00, 8.80971097e+00, 4.76128728e+00],
## [-5.11182428e+01, -4.14346279e+00, -4.63620108e+00],
## [ 1.19422883e+01, 1.17202085e+01, 1.78401855e-01],
## [-2.85488637e+01, 1.10895200e+00, 4.46923422e+00],
## [ 1.65722821e+01, 2.28657136e+01, -1.34839641e+01],
## [-7.27605326e+00, 5.44360988e+00, 1.38424372e+01],
## [-1.39481680e+00, 1.23946820e+01, -1.24161470e+00],
## [-3.57611424e+01, -6.14932103e-01, -3.47914691e+00],
## [-5.56824383e+01, -6.04647809e+00, -5.37405177e+00],
## [ 3.47074660e+00, 1.29911907e+01, 2.44190262e+00],
## [ 2.71556995e+01, 1.96728294e+01, 7.28986346e+00],
## [-4.90814557e+00, 5.98150833e+00, 1.47189450e+01],
## [ 1.50797139e+01, 2.04361696e+01, -7.74062232e+00],
## [-3.35892890e+01, -5.33721713e-01, 4.10237348e+00],
## [-2.41064512e+01, 6.47094447e-01, 5.46649319e+00],
## [ 2.27965033e+01, 1.26003007e+01, 2.23069595e+01],
## [-1.79844163e+01, 1.89744866e+00, 2.71072537e+00],
## [ 2.45696088e+01, 1.08958074e+01, 1.91126609e+01],
## [-9.95892169e+00, 2.25634243e+00, 1.15296709e+01],
## [-5.96819488e+01, -6.46094652e+00, -5.55604078e+00],
## [-8.15173790e+00, 2.46796346e+00, 8.11635059e+00],
## [-4.55195527e+01, -3.24381160e+00, -3.13721279e-01],
## [-6.99522665e+00, 5.78541242e+00, 9.41093783e+00],
## [-6.69463910e-01, 1.19931334e+01, 1.08273538e+00],
## [-3.84793525e+01, -2.18453805e+00, 1.20183059e+00],
## [-2.57715362e+01, 1.66150707e+00, 8.99490548e-01],
## [-5.93380706e+01, -6.38283529e+00, -5.42876156e+00],
## [ 2.92293945e+01, 2.56917453e+01, -8.65077675e+00],
## [ 3.21168707e+01, 2.69137463e+01, -9.28678091e+00],
## [ 2.47252118e+01, 2.29483437e+01, -5.13728827e+00],
## [-3.73078885e+01, -6.71292902e-01, -2.68768956e+00],
## [ 1.39035934e+01, 1.42152084e+01, 6.18553147e-02],
## [-5.29068579e+01, -4.87493535e+00, -3.18968938e+00],
## [ 2.93172116e+00, 1.57497168e+01, -6.74546629e+00],
## [-2.70278543e+01, 9.56773991e-01, 6.53113388e+00],
## [ 2.24329044e+01, 2.32214897e+01, -8.37666016e+00],
## [ 2.09432786e+01, 1.75701575e+01, -8.95448349e-01],
## [ 2.44632082e+01, 1.57430648e+01, 1.62863490e+01],
## [-3.40531658e+01, 3.77589716e-03, 8.18941401e-01],
## [ 2.54995237e+01, 1.00880767e+01, 4.68107012e+00],
## [ 2.76454652e+01, 7.14555292e+00, 1.22189784e+01],
## [-1.36435809e+01, -8.25282048e-01, 6.08001220e+00],
## [-1.05741367e+01, 3.25782865e+00, 1.18861790e+01],
## [-2.36446118e+01, 2.85658939e+00, 4.37644447e+00],
## [-2.03505891e+00, 8.86074493e+00, 2.50492299e+00],
## [ 2.48916043e+01, 5.26479612e+00, 2.19172186e+01],
## [ 9.43624467e+00, 1.10638786e+01, 1.45356493e+01],
## [-4.98375574e+01, -4.81551057e+00, -2.21452643e+00],
## [-1.41490168e+01, 3.88233920e+00, 1.12983434e+01],
## [-4.85321677e-01, 6.87984709e+00, 9.64306554e+00],
## [-4.10150386e+01, -1.00758880e+00, -2.67796855e+00],
## [-2.17791323e+01, 6.71995959e+00, -7.12694857e+00],
## [-3.30736906e+01, -1.23691457e+00, 1.62360652e-01],
## [ 2.73665963e+01, 2.04438118e+01, 8.88237997e-01],
## [ 2.97328321e+01, 2.34341344e+01, -1.32072701e+00],
## [-3.91878366e+01, -1.58547472e+00, 1.36738561e+00]])
archetypes_wagner = AA.alfa
pd.DataFrame(AA.archetypes).to_csv("./outputs/Coordinates_3PCs_Archetypes_Wagner.csv")
pd.DataFrame(archetypes_wagner).to_csv("./outputs/Archetypes_PCA_Wagner.csv")
ArchetypesBCWagner <- as_tibble(read.csv("./outputs/Archetypes_PCA_Wagner.csv",header=TRUE))
ArchetypesBCWagner<- ArchetypesBCWagner%>%column_to_rownames(var = "X")
rownames(ArchetypesBCWagner) <- c("arch3", "arch1","arch2","arch4")#c("arch1", "arch2","arch3","arch4")
## Checking that the sum is equal to 1 : probability to belong to one of the archetypes
#colSums(archWPCA)
colnames(ArchetypesBCWagner) <- rownames(CellsPropBC2)
ArchetypesCoord.3PCA <- as.matrix(read.csv("./outputs/Coordinates_3PCs_Archetypes_Wagner.csv", header=TRUE))
ArchetypesCoord.3PCA <- ArchetypesCoord.3PCA[,-1]
colnames(ArchetypesCoord.3PCA) <- c("arch3", "arch1","arch2","arch4")
gradeBC <- MetaData[MetaData$Patient %in% rownames(CellsPropBC2),"Ki-67 percent positive cells by IHC"]
fig2 <- plot_ly(x = scBCPCA$li$Axis1,
y = scBCPCA$li$Axis2,
z = scBCPCA$li$Axis3,
#color = as.numeric(gradeBC$`Ki-67 percent positive cells by IHC`),#~CellsPropBC2$`CD4-T`,
showlegend=TRUE,
type = "scatter3d", mode = "markers",
marker = list(color = toRGB("steelblue",0.7) ,symbol = "triangle",size = 6),#
mode = "text",
name= "tumor")
fig2 <- fig2 %>% layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3") ),
title = "PCA on cell proportions space of BC tumors")
fig2 <- fig2 %>% add_trace(x = ArchetypesCoord.3PCA[1,],
y = ArchetypesCoord.3PCA[2,],
z = ArchetypesCoord.3PCA[3,],
type = "scatter3d",mode = "lines+markers+text",
text = c("arch. 3", "arch. 1","arch. 2","arch. 4"),#c("arch. 2", "arch. 1","arch. 3","arch. 4"),
textposition = c('top right','bottom right','top left','top right'),
textfont = list(color = '#000000', size = 16),
showlegend = TRUE,
name = "Archetypes",
marker = list(color = c("gold","dodgerblue","darkmagenta","black"),symbol = "star-diamond",size = 14),
line = list(color = "black", size = 5),
inherit = FALSE)%>%
add_trace(x = ArchetypesCoord.3PCA[1,c(2,4,1,3)],
y = ArchetypesCoord.3PCA[2,c(2,4,1,3)],
z = ArchetypesCoord.3PCA[3,c(2,4,1,3)],
type = "scatter3d",mode = "lines",
line = list(color = "black", size = 5),
showlegend = FALSE,
inherit = FALSE)
fig2
#library(plotly)
#orca_available()
#orca(fig2, "./figs/fig3DWagner1.pdf",debug=TRUE)
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
# de-correlate the cells using sample() as a column-wise operation (for each cell type) + re-normalize each sample so that it sums up to 100
# For each randomization, we perform an unscaled, centered PCA on the data
# We then collect the list of eigenvalues for each principal component
# A cumulative sum of variance explained is then computed (in %)
# The results are plotted in a figure with the mean of those shuffled +/- 2 sd
scBCPCA <- dudi.pca(df = CellsPropBC2,scale= TRUE, center=TRUE, scannf= FALSE, nf = length(colnames(CellsPropBC2))-1)#TOREMOVE after test
#library(fgpt)
nbShuffles <- 100
nPC <- length(colnames(CellsPropBC2)) -1
EigenVals.rand <- matrix(nrow = nbShuffles,ncol = nPC)
set.seed(5) # for reproducibility
for (i in 1:nbShuffles){
CellsPropBC.rand <- apply(CellsPropBC2,MARGIN = 2,function(x) sample(x,size = length(x),replace = FALSE)) ## Column-wise operation ==> MARGIN=2 #apply(CellsPropBC2,MARGIN = 2,function(x) fyshuffle(x))
sumsSamples <- apply(CellsPropBC.rand,MARGIN=1,function(x) sum(x))
CellsPropBC.rand2 <- (CellsPropBC.rand/sumsSamples)*100
eigs <- dudi.pca(CellsPropBC.rand2,scale = TRUE,center = TRUE,nf = 3, scannf = FALSE)$eig
#print(length(eigs))
EigenVals.rand[i,] <- cumsum(eigs/sum(eigs)*100)
#eigs <- dudi.pca(CellsPropBC.rand2,scale = FALSE,center = TRUE,nf = 3, scannf = FALSE)$
}
#### Get mean and +/- sd of the eigen values from shuffles (100 shuffles)
scBC.rand.CumEigs.mean <- apply(EigenVals.rand,MARGIN = 2,function(x) mean(x))
scBC.rand.CumEigs.sdUp <- apply(EigenVals.rand,MARGIN = 2,function(x) mean(x) + 2*sd(x))
scBC.rand.CumEigs.sdLow <- apply(EigenVals.rand,MARGIN = 2,function(x) mean(x) - 2*sd(x))
scBC.CumEigs <- cumsum(scBCPCA$eig/sum(scBCPCA$eig)*100) # Eigen vlaues of PCA from real tumors (CellsPropBC2 abundance matrix as input)
#Correlation map after randomization
CellsPropBC2.2 <- CellsPropBC2
colnames(CellsPropBC2.2) <- set_cells_names(colnames(CellsPropBC2.2))
heatmap.2(cor(CellsPropBC2.2),Colv = NA,Rowv = NA,dendrogram = "none",
key = FALSE,margins = c(4,6),#srtCol = 45,srtRow = 45,
lmat = rbind(c(0,0,0),c(0,1,0),c(0,1,0)),lhei=c(0.25,1.5,0.75),lwid=c(0.5,1.5,0.5),#rbind(c(3,4),c(2,1)),#lhei = 1.5,lwid = 1.5,#lmat = rbind(c(5,3,4), c(2,1,4)),lhei = c(1.5, 4),lwid = c(1.5, 4, 0.75),
density.info = "none",trace = "none",col = viridis(12))
heatmap(cor(CellsPropBC.rand2,method = "pearson"),Colv = NA,Rowv = NA,col = viridis(12))
nbComp <- seq(1,nPC,1)
CumEigs.rand <- as_tibble(cbind(scBC.rand.CumEigs.sdUp,scBC.rand.CumEigs.sdLow))%>%
mutate(nbComp = nbComp)
CumEigs <- as_tibble(cbind(scBC.CumEigs,scBC.rand.CumEigs.mean))%>%
mutate(nbComp = nbComp)
CumEigs <- CumEigs%>%
dplyr::rename(`real \ntumors` = scBC.CumEigs)%>%
dplyr::rename(`shuffled \ntumors` = scBC.rand.CumEigs.mean)%>%
pivot_longer(cols = c(`real \ntumors`, `shuffled \ntumors`),
names_to = "groups",
values_to = "value")
ggplot()+
geom_line(data = CumEigs,
aes(x = nbComp,y = value,
group = groups,color = groups,
linetype = groups))+
scale_linetype_manual(values = c("solid", "twodash"))+
scale_color_manual(values = c("black","red"))+
scale_size_manual(values = c(2, 2))+
geom_vline(xintercept = 3, linetype = "dotted",
color = "black", size = 0.8)+
labs(x = "Number of Principal Components (PCs)",
y = "% of variance in cellular \ncomposition of tumors")+
theme(legend.title = element_blank())+
geom_ribbon(data = CumEigs.rand, aes(x = nbComp,
ymin = scBC.rand.CumEigs.sdLow,
ymax = scBC.rand.CumEigs.sdUp),
fill = "indianred2",alpha = 0.3)
# ggsave("./figs/EigsRandWagner2.pdf", height = 3, width = 4)
# ggplot()+
# geom_line(data = as_tibble(scBC.CumEigs),
# aes(x = nbComp,y = value))+
# scale_linetype_manual(values ="solid")+
# scale_color_manual(values = "black")+
# scale_size_manual(values = c(2, 2))+
# geom_vline(xintercept = 3, linetype = "dotted",
# color = "black", size = 0.8)+
# labs(x="number of principal components (PCs)",
# y="% of variance in cellular \ncomposition of tumors")+
# theme(legend.title = element_blank())#+
# ggsave("./figs/EigsWagner.pdf", height = 3, width = 4)
### PCA
#figRand <- plot_ly(x = scBCPCA.rand$li$Axis1,
# y = scBCPCA.rand$li$Axis2,
# z = scBCPCA.rand$li$Axis3,
# type = "scatter3d", mode = "markers",
# marker = list(symbol = "triangle",size = 4),
# mode = "text")
#figRand <- figRand %>% layout(scene = list(xaxis = list(title = "PC1"), yaxis = list(title = "PC2"), zaxis = list(title = "PC3") ),
# title = "PCA on cell proportions space of randomized BC tumors")
#figRand
library(reticulate)
nbShuffles <- 10
nPC <- length(colnames(CellsPropBC2)) - 1
#EigenVals.rand <- matrix(nrow = nbShuffles,ncol = nPC)
set.seed(25) # for reproducibility
randomData <- data.frame(shuffle_idx=integer(),archetype=character(),PC1 = double(), PC2 = double())
# data1 = sys.argv[1]
# evs = sys.argv[2]
# means = sys.argv[3]
# nbArchs = sys.argv[4]
# outputPath1 = sys.argv[5]
nbArch <-3
for (i in 1:nbShuffles){
CellsPropBC.rand <- apply(CellsPropBC2,MARGIN = 2,function(x) sample(x,size = length(x),replace = FALSE)) ## Column-wise operation ==> MARGIN=2 #apply(CellsPropBC2,MARGIN = 2,function(x) fyshuffle(x))
sumsSamples <- apply(CellsPropBC.rand,MARGIN = 1,function(x) sum(x))
CellsPropBC.rand2 <- (CellsPropBC.rand/sumsSamples)*100
pcaData <- dudi.pca(CellsPropBC.rand2,scale = FALSE,center = TRUE,nf = 2, scannf = FALSE)
randData <- as.matrix(pcaData$li[,1:2])
eigVects <- as.matrix(pcaData$c1[,1:2])
means <- as.matrix(pcaData$cent)
#system("python3 /scratch/anissa.el/ImmuneStates/wagner_analysis/scripts/python/archetype_analysis.py 'randData' 'eigVects' 'means' 3 '/scratch/anissa.el/ImmuneStates/wagner_analysis/outputs/' ",wait = FALSE)
#py_run_file("/scratch/anissa.el/ImmuneStates/wagner_analysis/scripts/python/archetype_analysis.py", local = FALSE, convert = TRUE)
py_run_string("import numpy as np")
py_run_string("import sys")
py_run_string("sys.path.insert(1, './TMENS_analysis/src/utils')")
py_run_string("from archetypes import ArchetypalAnalysis")
py_run_string("import pandas as pd")
#py_run_string("pc3dWagner =np.array(randData, dtype = 'float64')")
py_run_string("def compute_archetypes(randData,nbArch,eigVects,means):
pc3dWagner = randData
AA = ArchetypalAnalysis(n_archetypes = nbArch,
tolerance = 0.001,
max_iter = 200,
random_state = 0,
C = 0.0001,
initialize = 'random',
redundancy_try = 30)
AA.fit_transform(pc3dWagner)
archsCellsSpace = np.dot(AA.archetypes.T,eigVects.T) + means.T
return archsCellsSpace.T")
#py_run_string("AA.fit_transform(pc3dWagner)")
#py_run_string("archsCellsSpace = np.dot(AA.archetypes.T,eigVects) + np.array(means,dtype='float64')")
ArchsCoords1 <- (py$compute_archetypes(randData,as.integer(3),eigVects,means))
#print(length(eigs))
#EigenVals.rand[i,]<- cumsum(eigs/sum(eigs)*100)
ArchsProj <- scale(t(ArchsCoords1),scale=FALSE,center=scBCPCA$cent)%*%as.matrix(scBCPCA$c1[,1:2])
#Archs <- read.csv("/scratch/anissa.el/ImmuneStates/wagner_analysis/outputs/shuffled_data_archetypes.csv")
#shuffledData <- data.frame(cell_type = colnames(CellsPropBC2),shuffle_idx = rep(i,length(colnames(CellsPropBC2))), mean_cells = means,PC1 = eigVects[,1], PC2 = eigVects[,2],arch1 = ArchsCoords1[,1] ,arch2 =ArchsCoords1[,2] ,arch3 =ArchsCoords1[,3],origPC1 = scBCPCA$c1[,1],origPC2 = scBCPCA$c1[,2], origMeans = scBCPCA$cent)
shuffledData <- data.frame(shuffle_idx = rep(i,nbArch),archetype = c("ARCH1","ARCH2","ARCH3"),PC1 = ArchsProj[,1],PC2=ArchsProj[,2])
randomData <- rbind(randomData,shuffledData)
}
### Projection onto the original PC space
#origpC <- scBCPCA$c1[,1:2]
#origMeans <- scBCPCA$cent
rownames(ArchetypesCoord.3PCA) <- c("PC1","PC2","PC3")
ggplot()+
geom_point(data=as_tibble(scBCPCA$li,rownames=NA),aes(x=Axis1,y=Axis2),color="red")+
scale_color_manual(name="Original tumors")+
geom_point(data =as_tibble(t(ArchetypesCoord.3PCA[1:2,c("arch1","arch2","arch3")]),rownames=NA),aes(x = PC1,y = PC2))+
geom_polygon(data =as_tibble(t(ArchetypesCoord.3PCA[1:2,c("arch1","arch2","arch3")]),rownames=NA),aes(x = PC1,y = PC2),color="black",fill=NA)+
geom_point(data=randomData,aes(x=PC1,y=PC2,group=factor(shuffle_idx)),color="grey")+
geom_polygon(data=randomData,aes(x=PC1,y=PC2,group=factor(shuffle_idx)),color="grey", fill=NA)+
labs(group="archetypes (shuffled tumors)")
CellsProp.archetypes <- compute_cells_prop_archetypes(ArchetypesCoord.3PCA,eigenVectPCA = as.matrix(scBCPCA$c1[,1:3]),meansPCA = scBCPCA$cent)
CellsProp.archetypes <- as_tibble(t(CellsProp.archetypes),rownames = NA)%>%
rownames_to_column(var = "archetype")%>%
pivot_longer(cols = colnames(CellsPropBC2),
names_to = "cell_type",values_to = "proportion")
## Change the labels of cell types to make it more readable :)
CellsProp.archetypes <- CellsProp.archetypes%>%
mutate(cell_type = set_cells_names(cell_type))%>%
mutate(archetype = str_replace(archetype,pattern = "h",replacement = "h. "))
## Bar plot of cells proportions for each archetype
barplot1 <- ggplot(data = CellsProp.archetypes, aes(x = cell_type, y = proportion,fill = archetype)) +
geom_bar(stat = "identity",position = position_dodge(),width = 0.6) +
scale_fill_manual("Archetype", values = c("arch. 1" = "dodgerblue",
"arch. 2" = "darkmagenta",
"arch. 3" = "gold",
"arch. 4" = "black"))+
#theme(axis.text.x = element_text(angle = 35,hjust = 1.2,vjust = 0.9))+
theme(axis.text.x = element_text(angle = 90, vjust = .2))+
xlab ("") + ylab("% tumor cellular composition")
barplot1
ggsave("./figs/barplotArchetypes.pdf",barplot1, height = 3, width = 4)
Scatter plots of cells that are correlated and anti-correlated:
library(sfsmisc)
##
## Attaching package: 'sfsmisc'
## The following object is masked from 'package:dplyr':
##
## last
#cor.test(CellsPropBC2$`CD4-T`,CellsPropBC2$`CD8-T`)
ggplot(data = CellsPropBC2, mapping = aes(x = `CD4-T`,y = `CD8-T`))+
geom_point()+
labs(x = "% CD4 T",y = "% CD8 T")+
stat_cor(
method = "pearson",
alternative = "two.sided",
cor.coef.name = "R",
aes(label=paste(..r.label..,"\n p ==" ,pretty10exp(..p..,digits = 3),sep="~")))
ggsave("./figs/plotCD8vsCD4.pdf", height=3, width=4)
ggplot(data = CellsPropBC2, mapping = aes(x = `CD8-T`,y = EpithelialCells))+
geom_point()+
labs(x = "% CD8 T",y = "% cancer cells")+
stat_cor(
method = "pearson",
alternative = "two.sided",
cor.coef.name = "R",
aes(label=paste(..r.label..,"\n p ==" ,pretty10exp(..p..,digits = 3),sep="~")),
label.x.npc = "middle",
label.y.npc = "top")
ggsave("./figs/plotCD8vsCancer.pdf", height=3, width=4)
Let’s check the healthy tissues. We already know that there must not as immune infiltration as high as observed in the hot tumors, but It is interesting to see the landscape of healthy tissue as a basis comparison with TME :
We can see that healthy tissues have already around 20 % T cells but the distribution in this boxplot is not as wide as It might have been observed in tumor tissue, but here we have only 4 samples, so It is difficult to measure accurately he proportions for healthy tissues in general.
Let’s project normal tissues in the PCA made on BC tumors:
## Projection of the healthy tissue samples in PCA space of BC tumors :
# FIrst we multiply by the eigen vectors of the PCA
# Then , we center around the means of each cell type
## THe resulting matrix has 3 columns, one for each dimension
PC <- as.matrix(scBCPCA$c1[,1:3],ncol = 3)
JTissue_proj <- (scale(JTCellsProps,center = scBCPCA$cent,scale = FALSE) %*% PC)
MTIssue_proj <- (scale(MCellsProps,center = scBCPCA$cent,scale = FALSE) %*% PC)
fig3 <- fig1 %>% add_trace(x = as.vector(JTissue_proj[,1]),
y = as.vector(JTissue_proj[,2]),
z = as.vector(JTissue_proj[,3]),
type = "scatter3d",mode = "markers",
showlegend = TRUE,
name = "Juxta-tumoral tissue",
marker = list(color = toRGB("mediumseagreen",0.7) ,symbol = "star-diamond",size = 6),
inherit = FALSE)
#fig3 <- fig3 %>% add_trace(x = as.vector(MTIssue_proj[,1]),
# y = as.vector(MTIssue_proj[,2]),
# z = as.vector(MTIssue_proj[,3]),
# type = "scatter3d",mode = "markers",
# showlegend = TRUE,
# name = "Mammary healthy tissue",
# marker = list(color = toRGB("indianred",0.85),symbol = "star-diamond",size = 6),
# inherit = FALSE)
fig3 <- fig3 %>% add_trace(x = ArchetypesCoord.3PCA[1,],
y = ArchetypesCoord.3PCA[2,],
z = ArchetypesCoord.3PCA[3,],
type = "scatter3d",mode = "lines+markers+text",
text = c("arch. 3", "arch. 1","arch. 2","arch. 4"),#c("arch. 2", "arch. 1","arch. 3","arch. 4"),
textposition = c('top left','top left','bottom left','bottom right'),
textfont = list(color = '#000000', size = 16),
showlegend = TRUE,
name = "Archetypes",
marker = list(color = "black",symbol = "star-diamond",size = 14),
line = list(color = "black", size = 5),
inherit = FALSE)%>%
add_trace(x = ArchetypesCoord.3PCA[1,c(2,4,1,3)],
y = ArchetypesCoord.3PCA[2,c(2,4,1,3)],
z = ArchetypesCoord.3PCA[3,c(2,4,1,3)],
type = "scatter3d",mode = "lines",
line = list(color = "black", size = 5),
showlegend = FALSE,
inherit = FALSE)
fig3 <- fig3 %>% layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "PCA on cell proportions space of BC tumors")
fig3
CellsBreastTissue.pca <- dudi.pca(AllCellsProps,scale=FALSE, center=TRUE, scannf=FALSE, nf=3)
fviz_eig(CellsBreastTissue.pca)
fviz_pca_biplot(CellsBreastTissue.pca,col.var = 'indianred2',col.ind = 'steelblue2',invisible = 'ind', repel = TRUE)
#fviz_pca_biplot(CellsBreastTissue.pca,axes = c(1,3),col.var = 'indianred2',col.ind = 'steelblue2',invisible = 'ind', repel = TRUE)
## Plot in 3PC space the samples: Tumors+ Mammary tissue + Juxta-tumoral tissue
fig7 <- plot_ly(x = CellsBreastTissue.pca$li$Axis1,
y = CellsBreastTissue.pca$li$Axis2,
z = CellsBreastTissue.pca$li$Axis3,
type = "scatter3d", mode = "markers",
color=~CellsPropBC$Tissue,#AllCellsProps$Fibroblasts,
marker = list(symbol = "triangle",size = 6),
mode = "text")
fig7 <- fig7 %>% layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "PCA on cell proportions space of BC tumors (correlations between cell types)")
fig7
We ask this question in order to know if the cellular abundance is leaded by the expression (or not) of several markers used to classify breast cancers. THose are hormonal markers such as PR, ER and HER2. We would like to know if this classification is relevant enough regarding cellular abundance in the TME.
First, let’s take the 20% closest tumors from each archetype, and build a contingency table of BC subtypes regarding each archetype (archetype x VS the rest and sub-type y VS the rest, etc…)
euclidean_dist_archetype <- function(tumorsCoord, archetypeCoord){
distances <- apply(tumorsCoord,MARGIN = 1, function(x) sqrt(sum((x - archetypeCoord)^2)) )
return(distances)
}
nTopTumors = round(.2 * nrow(scBCPCA$li[,1:3]))
distsA1 <- sort(euclidean_dist_archetype(scBCPCA$li[,1:3],ArchetypesCoord.3PCA[,1]))[1:nTopTumors]
distsA2 <- sort(euclidean_dist_archetype(scBCPCA$li[,1:3],ArchetypesCoord.3PCA[,2]))[1:nTopTumors]
distsA3 <- sort(euclidean_dist_archetype(scBCPCA$li[,1:3],ArchetypesCoord.3PCA[,3]))[1:nTopTumors]
distsA4 <- sort(euclidean_dist_archetype(scBCPCA$li[,1:3],ArchetypesCoord.3PCA[,4]))[1:nTopTumors]
#rm(dists)
#sqrt(sum((scBCPCA$li[1,] -ArchetypesCoord.3PCA)^2))
#names(distsA1)
#split(distsA1,f=names(distsA1))
distsA1 <- as_tibble(data.frame(distsA1),rownames=NA)%>%rownames_to_column(var="Patient")
distsA2 <- as_tibble(data.frame(distsA2),rownames=NA)%>%rownames_to_column(var="Patient")
distsA3 <- as_tibble(data.frame(distsA3),rownames=NA)%>%rownames_to_column(var="Patient")
distsA4 <- as_tibble(data.frame(distsA4),rownames=NA)%>%rownames_to_column(var="Patient")
PatientsArchs <- left_join(StatusTumor,distsA1,by = "Patient")
PatientsArchs <- left_join(PatientsArchs,distsA2, by = "Patient")
PatientsArchs <- left_join(PatientsArchs,distsA3, by = "Patient")
PatientsArchs <- left_join(PatientsArchs,distsA4, by = "Patient")%>%mutate(archetype = ifelse(!is.na(distsA1),"ARCH1",ifelse(!is.na(distsA2),"ARCH2",ifelse(!is.na(distsA3),"ARCH3",ifelse(!is.na(distsA4),"ARCH4","None")))))%>%dplyr::select(-c(distsA1,distsA2,distsA3,distsA4))%>%
column_to_rownames(var = "Patient")
ContingencyTableBCArchs<-table(PatientsArchs)
chisq.test(ContingencyTableBCArchs) #no association between archetypes and subtypes
## Warning in chisq.test(ContingencyTableBCArchs): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: ContingencyTableBCArchs
## X-squared = 8.0751, df = 12, p-value = 0.7792
Building contingency tables and doing fisher test to see if there is a non-random association between BC subtype and archetypes found:
## archetype
## Status ARCH1 ARCH2 ARCH3 ARCH4 None
## ER+/PR+ -0.07417759 0.03685372 -0.15906649 -0.32571636 0.11134745
## HER2+ 0.30433403 0.30433403 -0.62855177 1.30433403 -0.67363966
## TN 0.42986492 -1.15509758 1.08194161 0.42986492 -0.54810878
## Unknown -0.01759406 -0.01759406 0.63448264 0.56736844 -0.41060525
## TN ER+/PR+ HER2+ Unknown
## ARCH1 0.6013417 0.3658964 0.6299559 1.0000000
## ARCH2 0.3476501 1.0000000 0.6299559 1.0000000
## ARCH3 0.1500241 0.4508289 1.0000000 0.4740059
## ARCH4 1.0000000 1.0000000 0.3017166 1.0000000
fig4 <- plot_ly(x = scBCPCA$li$Axis1,
y = scBCPCA$li$Axis2,
z = scBCPCA$li$Axis3,
color =~StatusTumor$Status,
opacity=.7,
showlegend=TRUE,
type = "scatter3d", mode = "markers",
marker = list(symbol = "triangle", size = 8),
mode = "text",
name=~StatusTumor$Status)
fig4 <- fig4 %>% layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3") ),
title = "PCA on cell proportions space of BC tumors")
fig4 <- fig4 %>% add_trace(x = ArchetypesCoord.3PCA[1,],
y = ArchetypesCoord.3PCA[2,],
z = ArchetypesCoord.3PCA[3,],
type = "scatter3d",mode = "lines+markers+text",
text = c("arch. 3", "arch. 1","arch. 2","arch. 4"),#c("arch. 2", "arch. 1","arch. 3","arch. 4"),
textposition = c('top right','bottom right','top left','top right'),
textfont = list(color = '#000000', size = 16),
showlegend = TRUE,
name = "Archetypes",
marker = list(color = "black",symbol = "star-diamond",size = 14),
line = list(color = "black", size = 5),
inherit = FALSE)%>%
add_trace(x = ArchetypesCoord.3PCA[1,c(2,4,1,3)],
y = ArchetypesCoord.3PCA[2,c(2,4,1,3)],
z = ArchetypesCoord.3PCA[3,c(2,4,1,3)],
type = "scatter3d",mode = "lines",
line = list(color = "black", size = 5),
showlegend = FALSE,
inherit = FALSE)
fig4
THe healthy tissue is characterized by a high amount of adipocytes that are under the label “Other” in this dataset. We chose to remove it because it is not inside our definition of the TME even though it is well-known that in the TME of breast cancer, it is not rare to find them. Here, we are not oging to completely remove this cell type from our dataset because we still admit that the TME contains helathy tissue cells but in very low abundance. The archetype 3 symbolizes this healthy tissue that maps also with juxta-tumoral tissue.
CellsArch.scBCPCA <- compute_cells_prop_archetypes(ArchetypesCoordPCA = ArchetypesCoord.3PCA,
eigenVectPCA = scBCPCA$c1[,1:3],
meansPCA = scBCPCA$cent)
##FIXME: RENAME ArchCellsProp ==> misleading name !!! cell proportions of each archetype from scBCPCA data
CellsProps.postDiss <- remove_healthy_archetype(ArchCellsProp = CellsArch.scBCPCA,
ArchWeights = ArchetypesBCWagner,
ArchToKeep = c("arch1","arch2","arch4"),
ArchToRemove = "arch3")
CellsProps.postDiss2 <- CellsProps.postDiss + as.matrix(scBCPCA$li[,4:ncol(scBCPCA$li)])%*% t(as.matrix(scBCPCA$c1[,4:ncol(scBCPCA$c1)]))
#remove_healthy_archetype2 <- function(eigenVectPCA,meansPCA,ArchetypesCoordPCA,ArchWeights,ArchToKeep, ArchToRemove){#,eigenVectPCA
## Correction of the archetypes ==> remove the part of archetype 3 (made essentially of adipocytes)
# archWPCAcorr <- matrix(nrow=0,ncol=0) #matrix(nrow=length(ArchToKeep),ncol=ncol(ArchScoresPCA))
# for (i in 1:length(ArchToKeep)){
# archWPCAcorr <- rbind(archWPCAcorr,ArchWeights[ArchToKeep[i],]/(1-ArchWeights[ArchToRemove,]))
# }
# print(dim(archWPCAcorr))
# newCellProps <- scale(t(eigenVectPCA%*%(ArchetypesCoordPCA[,ArchToKeep]%*%as.matrix(archWPCAcorr))),center=-meansPCA,scale=FALSE)
#cellAbundCorrected <- t(as.matrix(Bcorr0) %*%as.matrix(archWPCAcorr))
# return(newCellProps)#(cellAbundCorrected)
#}
#newCellsProps <- remove_healthy_archetype2(eigenVectPCA =as.matrix(scBCPCA$c1[,1:3]) ,
# meansPCA =scBCPCA$cent ,
# ArchetypesCoordPCA =ArchetypesCoord.3PCA ,
# ArchWeights = ArchetypesBCWagner ,
# ArchToKeep = c("arch1","arch2","arch4"),
# ArchToRemove ="arch3" )[ArchetypesBCWagner["arch3",]<.5,]
#We remove tha samples whose percentage of adipocytes was initially(in CellsPropBC2) higher than 50% ==> they might skew our further analyses
#CellsProps.postDiss2 <- CellsProps.postDiss2[CellsPropBC2$Other<50,]
CellsProps.postDiss2 <- CellsProps.postDiss2[ArchetypesBCWagner["arch3",]<.5,]
## Compare cells proportions before and after dissection
# Join two tables : CellsPropBC2 (pre-dissection)
CellsProp.PrePostDiss <- full_join(CellsPropBC2 %>% mutate(status = "before_dissection"),
as_tibble(CellsProps.postDiss2) %>% mutate(status = "post_dissection"))
## Joining, by = c("EndothelialCells", "EpithelialCells", "Fibroblasts", "Other",
## "NK", "BCells", "PlasmaCells", "DC", "CD4-T", "CD8-T", "Macrophages",
## "OtherImmune", "status")
CellsProp.PrePostDiss2 <- CellsProp.PrePostDiss%>%
pivot_longer(cols = c(EndothelialCells,EpithelialCells,Fibroblasts,Other,NK,BCells,PlasmaCells,DC,`CD4-T`,`CD8-T`,Macrophages,OtherImmune),
names_to = "cell_type",values_to = "proportion")
ggplot(data = CellsProp.PrePostDiss2,
aes(x = status,y = proportion,color = cell_type,group = status))+
geom_boxplot()+
facet_wrap(~cell_type)+
theme(axis.text.x = element_text(angle = 45,hjust = 0.8,vjust = 0.5))+
labs(title = "Boxplot of proportions of cell types from BC samples pre and post dissection")+
xlab (" ") + ylab("proportion")
Even after having discarded the samples containing high amounts of Other cells, this cell type still contributes the most to the PCA and so It hides the effect on the variation of many less abundant cell types. Of course, we would know for example that Tregs and maybe dendritic cells might not be a big axis on our PCA as we know that we can find them on low concentrations in the organism but some immune cells can be very abundant such as B cells that are the factory of antibodies and macrophages that can highly infiltrate the TME either to trigger tumor suppression, either to enhance Its survival. But those known effects are hidden. Nonetheless, we shall not disconsider the effect of those adipocytes and other cells from our analyses.
As we already know, the tumor uses every external resources to survive and the pressure of evolution can give an advantage to those who are able to hijack the resident cells. For example, adipocytes in the breast tissue synthesize some lipids that are critical for survival of the surrounding cells. SOme cancer cells can use them as nutrients which can make adipocytes slowly dying. Can we consider that those cells are part of the TME ? Because they were in this tissue at first, they haven’t been recruited by any system, but somehow we can consider that they contribute to the development of Tumor as vasculature for example.
This analysis shows that the definition of the TME is quite sensitive as the tumor uses a lot of resources in order to survive and evade the immune system. We kept the tumor samples that had less than 50 % of adipocytes which seems to be a fair threshold regarding the poor amount of tumor samples.
This analysis shows also the question of using the right mathematical tool to test if building blocks can describe the macro-architecture of a tumor.